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Abstract 

Iterative majorize-minimize (MM) (also called optimization transfer) algorithms solve challenging numerical optimization 
problems by solving a series of “easier” optimization problems that are constructed to guarantee monotonic descent of the cost 
function. Many MM algorithms replace a computationally expensive Hessian matrix with another more computationally convenient 
majorizing matrix. These majorizing matrices are often generated using various matrix inequalities, and consequently the set of 
available majorizers is limited to structures for which these matrix inequalities can be efficiently applied. In this paper, we present 
a technique to algorithmically design matrix majorizers with wide varieties of structures. We use a novel duality-based approach to 
avoid the high computational and memory costs of standard semidefinite programming techniques. We present some preliminary 
results for 2D X-ray CT reconstruction that indicate these more exotic regularizers may significantly accelerate MM algorithms. 


I. Introduction 

Given aniVxiV Hermitian matrix H, we say that the Hermitian matrix M majorizes H if none of the eigenvalues of 
M-H are negative. Matrix majorizers are central to majorize-minimize algorithms and are ubiquitous in image processing 
algorithms 0, 0, (15), £T) Better majorizers can significantly affect how quickly algorithms converge HU, HD, but 
majorizers are usually designed by hand on a per-problem basis. Algorithmic majorizer design expands the class of usable 
majorizers and reveals more effective majorizers, but a straightforward semidefinite programming approach requires too much 
memory to be computationally feasible for many practical imaging problems. 

This paper presents an algorithmic approach to designing a majorizing matrix M for a general given Hermitian H. The 
algorithm we present has relatively low memory requirements, and it is practical for large problems where storing and 
manipulating dense N x N matrices would be infeasible. The goal of this chapter is to enable the design of more exotic 
majorizers than are currently accessible using various inequalities. This expanded class of majorizers may contain tighter 
majorizers (6j that lead to faster convergence j7]]. 

Conventionally, a majorizing matrix M ^ H is found using a collection of inequalities and matrix properties. A simple and 
common bound is M L i psc hit z = A max (H)I, which is often used in optimization algorithms for which the cost function gradient 
is Lipschitz continuous (with Lipschitz constant A max (H)). This choice often is a very loose bound. Often a tighter bound is 
the diagonal matrix 

Msqs = diag 

3 

We call this the “separable quadratic surrogates” (SQS) majorizer due to its ubituity in ordered subsets with SQS (OS-SQS) 
algorithms (TJ. If H contains only nonnegative entries (e.g., A T WA, the Hessian of the X-ray CT data-fit term [ 11, [23]) then 
Msqs can be quickly computed via 

M SQS =diag{[Hl] i }. (2) 

Our experiments (not shown) suggest that M S qs is a fairly tight diagonal majorizer when H contains only nonnegative entries, 
and its ease of computation ^ makes it a very useful tool. However, when H contains negative entries, Msqs appears to be 
less tight. In this case, carefully designed majorizers that exploit the structure of H, e.g., gd, can significantly improve on 
M S qs- 

This paper focuses on designing majorizers of the following form: 

M = K h DK, (3) 

where D € R Kx K is a diagonal matrix and K £ C KxN , K > N has full column rank. We assume the matrix K is selected 
by the MM algorithm designer beforehand and focus on designing D. A special case is K = I, which leads to a diagonal 
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majorizer. Our goal is to select the diagonal matrix D such that M is a majorizer of H and such that the values of D are “as 
small as possible” so that the majorizer is “tight” or “sharp ” To quantify this goal, we pose the following convex optimization 
problem: 

M = K t DK, where (4) 

D= argmin l||d||^, (5) 

D:K T DK>H A 

with positive diagonal weighting matrix W. The vector d G R N is the diagonal D. 


II. Methods 

Let H G C NxN be a given Hermitian positive semidefinite matrix, and let K G C KxN be a given matrix with full column 
rank. We aim to majorize H with the matrix K H DK, where D solves 


D= argmin -||d| 

D:K H DK>H 2 


2 

W- 


( 6 ) 


The vector d G R K is the diagonal of D, and W G R KxK is a given positive-definite weighting matrix. This is a convex 
minimization problem over a convex set, although the domain Q, 

n = {d : K h DK >z H} (7) 

is challenging to efficiently characterize. The majorizer design problem we pose ([6]) could be solved using a semidefinite 
programming (SDP) technique. However, to the best of our knowledge, algorithms to solve ([6]) using SDP involve storing and 
manipulating dense N x N matrices © © In practical image processing problems, N is the number of pixels in the image 
and can be very large, so storing arbitrary N x N matrices is infeasible. 

We rewrite the majorizer design problem © using the characteristic function lq as 

D = argmin l||d||^+ in(d). (8) 

d£R K Z 


The characteristic function in(d) is zero when d G H (i.e., K H DK majorizes H) and infinite otherwise. Instead of directly 
approaching we rewrite the characteristic function lq using a generalized convex conjugate GT 


<-n(d) = sup z H (H — K h DK)z 

(9) 

zec N 


= sup z h Hz - (Kz) h D(Kz) 

(10) 

z£C n 


= sup z h Hz — d T Kz 2 , 

(ID 

z£C n 


where Kz 2 contains the elementwise squared complex moduli of Kz: 


| Kz | 2 = ve C {|[ K z] fc | 2 }. 

(12) 

Substituting (pTj) in ^ yields the following min-max problem for designing D: 


D = argmin sup S(d,z), 

(13) 

deR K zec N 


S(d,z) = i||d||^-d T |Kz| 2 + z H Hz. 

(14) 


Reversing the order of the “argmin” and “sup” in (\3\ and solving for the minimizing d in terms of z yields the dual problem 


z = argmax L(z) 

zec N 

1 o 2 

L( z ) = —- IKzl- 


w 1 


where 
+ z h Hz. 


We solve this dual problem to recover the the optimal primal variable D. This requires the following two results: 
• Strong duality, i.e., 


(15) 

(16) 


Appendix [A] provides a proof. 


inf sup S(d,z)= sup inf S(d,z). 

deR« zeC N zec^dER^ 


(17) 
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A way to recover the primal variable D from a dual solution z. Appendix [b] shows 


d = W~ :1 |Kz|", 


(18) 


where z solves the dual problem. 

We have found a dual problem for the majorizer design problem ([6]). Unlike the original problem, the dual problem does not 
directly use the difficult-to-characterize domain U and can be approached without memory- and computationally-expensive 
SDP techniques. The drawback of this transformation is that the dual problem ( p~5] ) is not concave. Consequently, the steepest 
ascent algorithm in Section [irA] may converge to a local optimum. Nonetheless these suboptimal solutions may be useful, and 
Section [TUB] discusses modifications of these local optima to provide a matrix that majorizes H. 


A. Dual steepest ascent 


We find a local maximizer of the dual function L using steepest ascent. Initialize z(°) / 0. We compute the search g( n ) 
direction using the gradient of L(z^ n ^): 



g (n) = VL(z (n) ) 

(19) 



= 2(Hz - K h W“ 1 |Kz| 2 © Kz), 

(20) 

where © is element-wise multiplication. Maximizing the dual function L along this search direction involves solving the 
following line search problem: 



Q/( n ) = argmax f( n \a), where 

aGl 

(21) 



/ (n) (a) = L(z< n 4ag (n) ). 

(22) 

Setting (/("))'(<*) = 0 yields 





0 = c^o? + c 2 a ? + c\ol + Co, where 

(23) 



c 3 = 2vJW _1 v 2 

(24) 



c 2 = 2vJW -1 vi 

(25) 



ci = 2vJW _1 v 0 + v]"W -1 vi — 2 fr 2 

(26) 



1 

o 

1 

£ 

h y 

ii 

o 

(27) 

v 2 = 

, x|2 

KgW 

b2 = (g (n) ) H Hg(”) 

r / \ h i 

(28) 

vi = 5 

2 ■ Real |Kg ( ’ 

l) © KzW | bi= 2 • Real< fg (n) J Hz (n) ^ 

(29) 

v 0 = 

kzM 2 . 


(30) 


The root-finding problem ( [23] ) can be solved using an off-the-shelf routine, e.g., roots in the software package octave. 
We loop over the real roots of (/( n )) ( a ) and choose the step length a^ that maximizes the dual function. 

Each iteration of the proposed algorithm requires one multiplication by H and one multiplication by K and K H , and the 
algorithm stores only length -N and -K vectors. Thus the algorithm is practical even for large-scale problems like CT image 
reconstruction where one must compute Hz on the fly. This procedure will find a local maximizer of the dual function; the 
next section discusses how to manipulate the resulting approximate solution to produce a majorizer of H. 


B. Ensuring majorization 

The dual problem is not concave, so the steepest ascent algorithm in this section will not in general converge to a 
global maximizer of the dual function L. This means that the induced matrix, 

K H DK = K H W 1 diag|[|Kz| 2 j |k, (31) 


may not majorize H. To compensate for this possible suboptimality, we scale the resulting matrix: 

M = aK T DK >- H, where 


a e 


A n 


(k H DK) 2 H(k H DK) 2 ),3 


(32) 

(33) 
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(a) F 


(b) H 


Fig. 1: The nonnegative matrices F and H from the experiment in Section 
our proposed algorithm. 


III-A 


(c) JViDesign-Circ+Diag 

and the matrix MDesign-circ+Diag produced by 


This results in a majorizer for H; see Appendix jcjfor a proof. If ^K H DKj : is easily computed, then power iteration can 
be used to find the optimal (minimal) of a. This will be the case when e.g., K is a unitary matrix. If K is more complicated, 
power iteration may be undesirably computationally expensive, and we instead use the looser scaling a = 3. 


III. Preliminary experiments 
A. Weighted Toeplitz matrices and circulant majorizers 

We generated the N x N weighted Toeplitz matrix F with N = 512 and entries 

0.1+COS 2 (27T^) 

[ J « yTT J^J\ 

and set H = F T F. This choice is inspired by the 1/r-like response of the CT system matrix |5|. Figures 
and H, respectively. We generated three diagonal majorizers: 

• -M-Lipschitz = ^max(H)I, 

• Msqs using ([2]), and 

• Moesign-Diag, using the algorithing proposed in this chapter with K = I. 

We also generated M Des i gn _ C irc+Diag> a combination of circulant and diagonal matrices, using the proposed algorithm with 


la 


and 


lb 


(34) 
show F 


Udft 

I 


K-Circ+Diag 

Finally, we computed Mci rc 

M C irc = P C, 

where C is the best circulant approximation to F in the Frobenius-norm sense 

C = argmin ||C —H||p, 

C circulant 

and P is chosen with power iteration so PC >z H. 

We used each of the majorizers to solve the quadratic minimization problem 

x = argmin -x T Hx + x T g, 

x 2 


§, 


(35) 


(36) 


(37) 


(38) 


with g and x^ 0 ^ initialized with zero-mean normal random values. We performed the following simple majorize-minimize 
(MM) procedure: 


X++ 1 ) = x(") - M _1 (Hx (n) + g). 


(39) 


This experiment explores the relative accelerations that different majorizers provide. To solve ( [38] ) even faster with a majorize- 
minimize algorithm, we would also use some first-order acceleration scheme flO} , p6| , |17[. Figure [2a| shows how quickly 
the MM algorithm iterates {x^ n ^} converged to the solution of (38) as a function of iteration with each of the majorizers, 
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(a) Distance to solution 


(b) Majorized spectra 


Fig. 2: Convergence plots and majorized spectra for the small Toeplitz experiment in Section 


III-A 


The partially circulant 


majorizer MDesign-circ+Diag acts like both a majorizer and a preconditioner, accelerating convergence of the simple majorize- 
minimize algorithm. The ideal majorizer inverts the matrix H and produces a uniform majorized spectrum with value 1. 


and Figure |2b| shows the eigenvalues of for each majorizer M. For fast convergence, ideally those eigenvalues 

would be near 1 (7). 

The designed diagonal majorizer underperforms the more conventional SQS majorizer, Msqs- This is possibly due to the 
proposed algorithm producing a suboptimal solution to the majorizer design dual problem. Regardless, diagonal majorizers are 
not our primary interest here: the proposed algorithm is more useful for generating majorizers with more exotic structures. 

Except for edge conditions, circulant matrices and Toeplitz matrices are very similar. Because Mci rc and MDesign-Circ+Diag 
contain circulant terms, they can approximate H while majorizing it. When these matrices are inverted in the majorize-minimize 
procedure ( [39] ), they act as both preconditioners and majorizers. The preconditioning effect appears in better-conditioned spectra 
(i.e., closer to unform Is) for M C i rc and M Des i gn _ C irc+Diag in Figure 2b and in faster convergence rates in Figure 2a Because 
the majorizer design algorithm in this chapter can generate a majorizer with both circulant and diagonal components, it can 
capture the nonuniform weighting in H better than M c i rc . This results in further improvement over M c i rc . 


B. X-ray CT reconstruction 

Consider the following unconstrained X-ray CT reconstruction problem: 

x = argmin ]- | | Ax - y 1| ^ + R(x) (40) 

X 2 

with X-ray CT system matrix A, noisy measurements y, diagonal matrix of positive statistical weights W and edge-preserving 
regularizer R [^3). In this experiment, we assume the edge-preserving regularizer R is differentiable. The statistical weights 
W contain patient-specific data that one can separate from the large CT system matrix with variable splitting: 

1 2 

x = argmin -1|u — y|| w + R(x) such that u = Ax. (41) 

x 2 


Applying the alternating directions method of multipliers (ADMM) to this constrained problem yields the following set of 
iterated updates (18), (20) with positive-definite penalty matrix Y and dual variable rj u \ 

u (n+1) = (W - 


x (n+l) _ ar g m j n 


J-^Wy + rfAxW+^W)) 

Ax - u (n+1) 


1 


Vu 


(n) 


R( x ) 


r 7u ( fl + 1 ) = r]J n) + Ax( n+1 ) - u (n+1) . 


(42) 

(43) 

(44) 


We can choose F to make the u update (42) easy, and the dual variable update (44) is trivial. For this experiment, we follow 
the guidance in (20j and set 

r = medianjr/!,} • I. (45) 

i 

The only challenging operation in this algorithm is the x update ( [43] ) involving the computationally expensive CT system 
matrix A. 
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(a) x (0) 


(b) x 


( 64 ) 

Down 


Fig. 3: Initial image from filtered backprojection (3a) and the output of the ADMM algorithm with majorizer MDown after 64 
iterations d3b). 
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(a) Cost function vs. iteration 


(b) Cost function vs. time 


Fig. 4: Per-iteration (4a) and per-time (4b) cost function values 

-M-Circ and ]V[j)own* 


for the ADMM algorithm using the three majorizers Msqs, 


Let M >: A T rA be a majorizer for the quadratic term in Hessian of the x update (43). Instead of solving (43) exactly, we 
majorize the quadratic term and descend the surrogate function 

.1, , , 2 / / x \ T 


(n+1) 


argmm 


x - x ("> 


M 


(x-x (n) ) A T r(Ax ( 


n) _ u (™+ 1 ) 


Vu 


(n) 


) + R(x) 


(46) 


using five iterations of conjugate gradients. 

We simulated a noisy 2D fan-beam scan of an XCAT (22) phantom with an 888-channel detector and 984 views, and then 
reconstructed images onto a 512 x 512-pixel grid. Figure |3a| shows the intial image from filtered backprojection. 

We ran the ADMM algorithm (|42|)-([44|) with the following majorizers M for the x update: 

• the diagonal majorizer Msqs 0; 

• a circulant majorizer M C i rc found by the proposed majorizer design algorithm with K = Udft; and 
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• a majorizer using a “downsampled” version of the CT Gram matrix A T A, MDown- We set 


K 


Down — 


Ar> own 

I 


(47) 


and used the proposed majorizer design algorithm, so M Down was a sum of a diagonal matrix and an approximation to 
the CT Gram matrix. For this experiment, Aoown was a fan-beam CT system matrix that matched A except it had 82 
(instead of 984) views and 128 (instead of 888) channels. The downsampled A Down system had the same spatial coverage 
as the full system A but lower computational cost. 


We ran the majorizer design algorithm for 128 iterations for both M C ir C and M Down , and used the looser a = 3 scaling to 
ensure majorization (see Section B- 

All algorithms were run on a machine with a 12-core Intel Xeon processor, and multiplications with the system matrix A 
were performed using multithreaded C code. Figure [3b] shows the output of ADMM algorithm using M Down after 64 iterations. 
This is a preliminary experiment, and we do not yet have an essentially converged reference image to which to compare the 
tested algorithms, so we use the cost function at each iteration instead. Figures [4a] and [4b] show how quickly each algorithm 
descends the cost function (40) as functions of iteration and time, respectively. 

Because the designed majorizers Mci rc and Moown capture more of the structure of the Hessian H = A T A, they serve as 
both majorizers and preconditioners in the x update ( [43] ). This allows the ADMM algorithm using these majorizers to converge 
more rapidly than when using the less-structured diagonal majorizer M S qs- Thanks to the FFT, the circulant majorizer Ma rc is 
only marginally more computationally expensive than Moiag- On the other hand, MDown uses an approximation to the geometry 
of the original CT system matrix that makes it less computationally expensive than A T A. This allows M Down to capture more 
of A t A’s structure than Mci rc > leading to even faster convergence. While one could design a circulant majorizer for A T A 
using its point spread function and power iteration, it is less clear how one would design a majorizer like M Down without an 
algorithm like the one proposed in this paper. 

The ADMM-based reconstruction algorithms in this section may not converge quickly in absolute terms, especially compared 
to contemporary fast CT reconstruction algorithms m-m (18), but this experiment does show the benefits from using more 
sophisticated majorizers. Each of (l2)-(l4|, (I8] l use diagonal SQS-like majorizers, and they may benefit from algorithmically 
designed majorizers. 


IV. Conclusions and future work 

In this paper, we proposed a new way to design matrix majorizers. Our algorithm uses a duality-based approach to avoid 
relying on memory- and computation-intensive semidefinite programming (SPD) techniques. We proposed a simple steepest 
ascent algorithm to find a local maximum of the nonconcave dual problem, and we showed how to manipulate suboptimal 
solutions to guarantee a majorizer. In two preliminary experiments, we demonstrated the usefulness of the algorithmically 
designed majorizers. By capturing more of the structure of the majorized matrix, algorithmically designed majorizers with 
appropriately chosen structures can yield significant acceleration. 

The experiments in this paper are preliminary, and future work will present comparisons with state-of-the-art image recon¬ 
struction algorithms. The majorizer structure in this paper can also be used to design so-called block-separable surrogates |9) 
that may useful in distributed computing. 
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Appendix A 
Strong duality 

Let H G C NxN y 0 be a given positive semidefinite matrix, K G C KxN , K > N have full column rank and define Cl to 
be 


Consider the function 


The primal function is 


and the dual function is 


Cl = {d : K h DK >: H}. 

S(d,z) = l||d||^-d T |Kz| 2 + z H Hz. 

J(d) = sup S(d, z) 

zec N 

= 2 11^1 lw ^u(d). 


(48) 

(49) 

(50) 

(51) 


L(z) = inf S(d,z) 

deR K 


|Kz|- 


z h Hz. 


w- 


In this section we show the minimum of the primal function is equal to the maximum of the dual function: 


(52) 

(53) 


p = min J(d) = sup L(z) = d. (54) 

dem K zeC N 

Because J is a strongly convex function and Cl is a convex set, there exists a unique minimizer of J over Cl. Let d be this 
minimizer: 


d = argmin J(d). (55) 

deu 

Because H ^ 0 the unconstrained minimizer of ^||d||^, d unC onstrained = 0, does not lie in Cl. Therefore, d is on the boundary 
of Cl and — VJ (d'j = Wd is normal to Cl. 

We can characterize the feasible set Cl as an intersection of half-spaces: 

Cl = p| {d : z H (K h DK - H)z > 0} (56) 

z 

= Pi {d : d T |Kz| 2 >z h Hz}. (57) 

Z 

Because — VJ^d^ = Wd is normal to Q and on the boundary of Q, there exists an z for which one of the above inequalities 
holds with equality. That is, 


-Vj(d) = Wd = a|Kz| 2 


(58) 
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and 


z h Hz = d T |Kz| 2 =a(|Kz| 2 ) T W- 1 |Kz| : 


a 


|Kz| 


w 


We use (58) to find the minimum of the primal function: 


'( a ) = 5 


w 


_ 1 
“ 2 
ce" 
’ ~2 
= P • 


aW _1 |Kz 


w 


IKzl 


W- 


(59) 

(60) 

(61) 

(62) 

(63) 


It is widely known that p > d. Therefore, to show p = d it suffices to show that L(z) = p for some z. We try L(/3z), with 


p 2 = a: 


L(/?z) 


1 ,_,2 2 


via 


2 

2 


P 4 


a 

~ ~2 
= P, 


|/3Kz| 


IKzl 


IKzl 


IKzl 


+ /3 2 z t Hz 


w 

2 


/? 2 z t Hz 


w- 1 


w 1 


+ P 2 a 


IKzl 


w- 1 


w- 


(64) 

(65) 

( 66 ) 

(67) 

( 68 ) 


completing the proof. 


Appendix B 

Equivalence of majorizers from primal and dual problems 
In this paper, instead of solving the primal problem 


1 2 

D = argmin -||d|| w + ^(d) 

deM K z 

(69) 

= argmin sup ^ d ^ + z h Hz - d T Kz 2 

deR K zec N z 

(70) 

= argmin sup S(d,z), 

deR K zec N 

we reverse the order of the minimization and maximization and solve the dual problem 

(71) 

z = argmax inf S(d,z). 

zec N 

In this section, we prove that the primal solution 

(72) 

D p = argmin sup S(d,z) 

deR K zec N 

and the solution induced by solving the dual problem 

(73) 

T> d = argmin S(d,z), 

deR K 

where z solves the dual problem (72), are equal. 

(74) 

Let d be the maximum value attained by the dual function at z and p be the minimum value attained by the primal function 
at d: 


d= sup inf S(d,z), (75) 

ze c N 

p= inf sup S(d, z). 

deR K ze c N 


(76) 
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We proceed by contradiction. Assume that D p ^ D d . Since S(d, z) is a strongly convex function of d 


d= sup inf S(d,z) (77) 

zE C N 

= S(d d ,z) (78) 

< S(d p ,z) (79) 

< S sup (d p ,z) (80) 

z£C n 

= p. (81) 


That is, p d. This contradicts the strong duality result in Section |A| and we conclude that D = D p = D^. Now we can 
write the primal solution D in terms of dual solution z: 


D = argmin S(d, z) (82) 

deR K 

= argmin -||d||^ - d T |Kd| 2 + z h Hz (83) 

deR K 2 

= W _1 |Kd| 2 . (84) 


Appendix C 

Scaling for majorization 

Let z be a local maximum of the nonconcave dual function L(z) found by an iterative gradient-based method. Because z is 
an attractor of the maximization procedure and L is smooth, L is concave at z. That is, the Hessian of L at z is nonpositive 
definite: 


Rearranging, 


V 2 L(z) = —6K h W -1 diag { [|Ki| 2 ] }k + 2H ^ 0. 
3K h W -1 diag ( |Kz| 2 ] )k ^ H. 


(85) 


( 86 ) 


Even though we do not find the global maximum of L using the steepest ascent procedure in Section II-A| simply scaling the 
majorizer produced by a local maximum by a factor of 3 produces a majorizer for H. 







